Progressive Duplicate Detection
نویسندگان
چکیده
منابع مشابه
A Study of Progressive Techniques for Efficient Duplicate Detection
---Databases contains very large datasets, where various duplicate records are present. The duplicate records occur when data entries are stored in a uniform manner in the database, resolving the structural heterogeneity problem. Detection of duplicate records are difficult to find and it take more execution time. In this literature survey papers various techniques used to find duplicate record...
متن کاملDuplicate document detection
In document image filing applications it is important to be able to recognize whether a particular document has already been entered into the system either as an individual document or as an inclusion in another document. Document images could be matched on the basis of layout or contents. However, matching of layout may not be effective when style is strictly controlled. We develop a document ...
متن کاملCluster Based Duplicate Detection
We propose a clustering technique for entropy based text dis-similarity calculation of de-duplication system. Improve the quality of grouping; in this study we propose a Multi-Level Group Detection (MLGD) algorithm which produces a most accurate group with most closely related object using Alternative Decision Tree (ADT) technique. Our propose a two new algorithm; first one is Multi-Level Group...
متن کاملIndustry-scale duplicate detection
Duplicate detection is the process of identifying multiple representations of a same real-world object in a data source. Duplicate detection is a problem of critical importance in many applications, including customer relationship management, personal information management, or data mining. In this paper, we present how a research prototype, namely DogmatiX, which was designed to detect duplica...
متن کاملParallel Structured Duplicate Detection
We describe a novel approach to parallelizing graph search using structured duplicate detection. Structured duplicate detection was originally developed as an approach to externalmemory graph search that reduces the number of expensive disk I/O operations needed to check stored nodes for duplicates, by using an abstraction of the search graph to localize memory references. In this paper, we sho...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: IEEE Transactions on Knowledge and Data Engineering
سال: 2015
ISSN: 1041-4347
DOI: 10.1109/tkde.2014.2359666